



# Power Macro-modeling for Intellectual-Property (IP) based Multi-core MIPS processor Design

Hamza Javed<sup>1</sup>, Faisal Siddiq<sup>2</sup>, Muhammad Mansoor Ashraf<sup>3</sup>

<sup>1,2,3</sup>Department of Electrical Engineering, University of Engineering and Technology, Taxila (47050), Pakistan <sup>1</sup><u>Hamza.Javed2@students.uettaxila.edu.pk</u> <sup>2</sup><u>faisal.siddiq@uettaxila.edu.pk</u> <sup>3</sup><u>mansoor.ashraf@uettaxila.edu.pk</u>

#### Abstract:

Power consumption is a critical design constraint in electronic systems, and it becomes even more important in low-power systems. Power consumption increases with system complexity. Early power estimation is crucial in multicore design as it mitigates the risk of redesign. This research proposes an improved power macro-model in which random input values with different statistical properties are generated through the Enhanced Whale Optimization Algorithm (EWOA). A multicore 32-bit MIPS architecture serves as the test system. The model achieves an average percentage error of 7.45% in the multicore system. The simulated and estimated power results are compared, and the results are validated through statistical error analysis. This analysis confirms the accuracy of the proposed power macro-model, making it a reliable technique for early power estimation in complex systems.

#### **Keywords:**

MPSoC, Power Macro-modeling, IP, Multicore, EWOA and Regression

#### 1. Introduction:

Multi-Processor System-on-Chip (MPSoC) technology has advanced remarkably, enabling the integration of complex systems with multiple cores onto a single chip. This has driven innovation across various domains, including smartphones, computers, and smart electronic devices. Analyzing power and performance effectively is crucial for minimizing design duration, especially with the rising complexity of systems[1]. Increasing the number of cores increases power consumption, so power and performance analysis is an important design constraint in electronics. Power modeling has become more important as semiconductor feature sizes have decreased. High-level synthesis (HLS) works from high-level descriptions, which makes hardware designs easier to create.





Efficient placement optimizes area utilization, routing, and power performance[2]. The number of intellectual property (IP) cores in MPSoCs keeps increasing, and using many IP cores reduces system performance. The routers connecting IP cores play a very important role in performance, as they use 40% of total power[3].

Routing interconnects all chip components according to design rules, significantly impacting system area and power consumption. Efficient power macro-models for IP-based systems reduce both the number of functional simulations and the percentage error, as supported by comprehensive evaluations involving IP module simulations and analysis of experimental findings[4].

## 2. Literature Review:

It is essential to explore optimal routing patterns that will provide the best throughput. Performance parameters like delay and throughput must be evaluated, and router designs must be updated accordingly. Analytical models take less time than simulators and are very helpful when the number of cores is high[5]. Integration of memory elements and processors in SoCs requires a fast and scalable communication network[6]. Multi-core stacked systems have been acclaimed as a highly promising architecture for high-performance microelectronic devices in the foreseeable future[7].

The impact of fault-tolerant techniques on power consumption must be considered to avoid violating chip power constraints[8]. On-chip thermal management is essential to prevent exceeding the Thermal Design Power (TDP), which can trigger protective measures like voltage and frequency throttling or core power gating. However, such actions may compromise task deadlines and system reliability. Complex system verification is time-consuming, but reusing pre-designed blocks, such as IP blocks, minimizes design time and cost. Designers benefit from using previously validated components in their system design, reducing the problems associated with multi-core systems[9].

A power estimation model for a complete IP-based system offers designers the ability to choose an optimal system architecture and optimize power consumption at a higher level of abstraction. The dynamic power estimates are validated by comparing them with existing power models to assess both speed and accuracy[10]. By using pre-designed Intellectual Property (IP) blocks, designers can effectively address the challenges associated with multicore designs, reducing both time and resources. These models require statistical insights into input patterns to provide accurate power consumption forecasts and thermal considerations.

Genetic algorithm (GA) and krill herd (KH) techniques have been combined for energy-efficient task scheduling in heterogeneous multicore systems. The approach uses a multi-objective fitness function, considering makespan, processor utilization, speedup, and energy consumption, to achieve efficient task scheduling. The results affirm that the technique excels at energy-efficient task scheduling within multicore systems[10]. The Enhanced Whale Optimization Algorithm with Modified Mutualism (WOAmM) is an advanced meta-heuristic method that addresses premature convergence issues. WOAmM strikes a balance between exploration and exploitation, making it more effective at thoroughly searching the solution space without wasting computational resources. WOAmM outperforms a variety of algorithms in terms of effectiveness and convergence speed[11]. EWOAmM generates sets of input parameters covering the full span of input patterns. The algorithm's effectiveness is validated through statistical analysis of the proposed model.

## 3. Method of Analysis:

The digital system is implemented in the Verilog hardware description language (HDL) using Xilinx Vivado HLS, a High-Level Synthesis (HLS) tool. An optimization technique is then applied to estimate the power consumption of the system.

## A. Power Estimation

The power estimation model of an IP-based multicore digital design produces its output on the basis of various input patterns that have specific statistical characteristics[12]. The proposed model is tested on the 32-bit multicore MIPS processor shown in Figure 1.



Figure 1: Multicore MIPS Processor

First, the average power is estimated for each single IP block, and then the power is calculated by equation (1).

$$P_{core} = \sum_{i=1}^{n} P_{IPavg} \tag{1}$$

 $P_{core}$  is the total power of the $n$ IP blocks and  $P_{IPavg}$  is the average power of a single IP block.

$$P_{IPavg} = f(SP_{in}, TD_{in}) \tag{2}$$

The macro-model function f is obtained by providing inputs produced by the Enhanced Whale Optimization Algorithm (EWOA). It uses r primary inputs and a binary stream q of length s. The average input signal probability (SP) and the average input transition density (TD) are defined by the following relations.

$$SP_{in} = \frac{\sum_{i=1}^{r} \sum_{j=1}^{s} q_{ij}}{r \times s} \tag{3}$$

$$TD_{in} = \frac{\sum_{i=1}^{r} \sum_{j=1}^{s-1} q_{ij} \oplus q_{i,j+1}}{r \times (s-1)} \tag{4}$$
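The two statistics can be sketched in a few lines of Python. This is only an illustration, assuming the stream is stored as an r-by-s array in which `q[i][j]` is the value of input i at time step j; the function names and the sample bits are hypothetical and not taken from the original work.

```python
import numpy as np

def signal_probability(q):
    """Average input signal probability: fraction of bits that are 1."""
    q = np.asarray(q, dtype=np.uint8)
    r, s = q.shape
    return q.sum() / (r * s)

def transition_density(q):
    """Average input transition density: fraction of consecutive
    time-step pairs on which an input toggles (XOR of neighbours)."""
    q = np.asarray(q, dtype=np.uint8)
    r, s = q.shape
    toggles = np.bitwise_xor(q[:, :-1], q[:, 1:])
    return toggles.sum() / (r * (s - 1))

# Hypothetical stream: r = 2 inputs observed over s = 4 time steps.
q = [[0, 1, 1, 0],
     [1, 1, 0, 0]]
sp = signal_probability(q)  # 4 ones out of 8 bits
td = transition_density(q)  # 3 toggles out of 6 possible transitions
print(sp, td)
```

For this small stream both statistics come out to 0.5, which is easy to check by hand against the definitions.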





Correlation is used in equation (4). Correlation is helpful when the width of the signal pattern is large. Circular convolution represents the number of times that both signals are high.

## B. Power Macro-Model Validation

Power estimation is implemented on the multicore MIPS processor at the RTL level. The input sets generated by the optimization algorithm have TD and SP values with different statistical characteristics. Linear regression and statistical analysis are used to compute the model accuracy, and the power macro-modeling results are compared with simulation results. The focus is statistical power macro-modeling for the multicore 32-bit MIPS architecture. First, the average power of each IP block is estimated individually. Second, the average power of a single core is estimated. Finally, all cores are integrated to construct a multicore system, and the average error for the complete test system is estimated.

The average error is computed by the following equation.

$$\varepsilon_{avg} = \frac{1}{N} \sum_{i=1}^{N} \left\{ \frac{|P_{simulated} - P_{estimated}|}{P_{simulated}} \right\} \tag{5}$$

where  $P_{simulated}$  is the average power obtained from Vivado HLS,  $P_{estimated}$  is the average power obtained from the proposed macro-model given in equation (1), and N is the number of simulations for each IP block, single core, and multicore processor[13].
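The error metric of equation (5) can be sketched as follows. The function name and the power values are made up for illustration; the factor of 100 is an assumption added so that the result reads as a percentage, matching how errors are reported in the tables.

```python
def average_percentage_error(p_simulated, p_estimated):
    """Mean of |P_sim - P_est| / P_sim over N simulations, in percent."""
    if len(p_simulated) != len(p_estimated) or not p_simulated:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(ps - pe) / ps
                for ps, pe in zip(p_simulated, p_estimated))
    return 100.0 * total / len(p_simulated)

# Made-up powers in mW for three simulations (not measured data).
sim = [100.0, 200.0, 250.0]
est = [90.0, 210.0, 250.0]
err = average_percentage_error(sim, est)
print(f"average error: {err:.2f} %")
```

The per-simulation relative errors here are 10 %, 5 % and 0 %, so the average is about 5 %.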

#### C. Whale Optimization Technique

Whale Optimization Technique is an optimization algorithm that draws inspiration from the hunting technique of humpback whales. It leverages this behavior to enhance global optimization and search processes in various problem-solving domains[14].

#### I. Whale Search for Prey

Whales search for their target around their current location. The following equations describe whale exploration:

$$D = |C \cdot P_{rn}(k) - P(k)| \tag{6}$$

$$P(k+1) = P_{rn}(k) - A \cdot D \tag{7}$$

where P is the position vector of the population, $P_{rn}$ is a vector representing a random member of the population, k denotes the current iteration, rn denotes a random value, and D is the distance between the current and the random population member. A and C are calculated as follows:

$$A = 2 a_1 \times rn - a_1 \tag{8}$$

$$C = 2 \times rn \tag{9}$$





Here $a_1$ decreases linearly from 2 to 0 over the iterations.

$P_{best}$ is the best location.

$$D = |C \cdot P_{best}(k) - P(k)| \tag{10}$$

$$P(k+1) = P_{best}(k) - A \cdot D \tag{11}$$
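The search and best-location updates share one form, which the sketch below captures in a single helper. It is a minimal illustration under stated assumptions: random values are drawn independently per dimension, the reference position is passed in by the caller (a random member for exploration, the best member otherwise), and the names `woa_linear_step`, `P_best` and `max_iter` are hypothetical.

```python
import numpy as np

def woa_linear_step(P, P_ref, a1, rng):
    """One update P(k+1) = P_ref - A * D with D = |C * P_ref - P|.

    P_ref is a random population member during exploration, or the
    best location found so far when moving toward the best solution.
    """
    A = 2.0 * a1 * rng.random(P.shape) - a1  # A = 2*a1*rn - a1
    C = 2.0 * rng.random(P.shape)            # C = 2*rn
    D = np.abs(C * P_ref - P)
    return P_ref - A * D

rng = np.random.default_rng(0)
P = np.array([0.2, 0.8])       # hypothetical current position
P_best = np.array([0.5, 0.5])  # hypothetical best position
max_iter = 10
for k in range(max_iter):
    a1 = 2.0 - 2.0 * k / (max_iter - 1)  # decreases linearly from 2 to 0
    P = woa_linear_step(P, P_best, a1, rng)
print(P)
```

Because a1 reaches 0 on the last iteration, A is 0 there and the position lands exactly on the reference, which is why shrinking a1 moves the search from exploration toward exploitation.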

#### II. Bubble-net Attacking Strategy

The spiral bubble-net predation strategy involves simulating a spiral motion to capture the target prey[15]. The whale estimates the distance between itself and the prey, and the location is communicated as follows:

$$D^* = |P_{best}(k) - P(k)| \tag{12}$$

$$P(k+1) = D^* \cdot e^{bl} \cdot \cos(2\pi l) + P_{best}(k) \tag{13}$$

where b is a constant and l is a random value given by

$$l = (a_2 - 1) \times rn + 1 \tag{14}$$
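A small sketch of the spiral move follows. It assumes the usual WOA form in which the spiral term is added to the best position, and it treats `a2` as a parameter supplied by the caller, since its schedule is not given here. The function names and all numeric values are hypothetical.

```python
import math

def draw_l(a2, rn):
    """Random spiral parameter l = (a2 - 1) * rn + 1."""
    return (a2 - 1.0) * rn + 1.0

def spiral_update(p, p_best, b, l):
    """Spiral move toward the best position, applied per dimension:
    P(k+1) = |P_best - P| * exp(b*l) * cos(2*pi*l) + P_best."""
    factor = math.exp(b * l) * math.cos(2.0 * math.pi * l)
    return [abs(pb - pi) * factor + pb for pi, pb in zip(p, p_best)]

# Hypothetical values: a2 = -1 and rn = 0.5 give l = 0.
l = draw_l(-1.0, 0.5)
new_p = spiral_update([1.0], [3.0], b=1.0, l=l)
print(l, new_p)
```

With l = 0 the factor is 1, so the new position is the best position plus the current distance to it; other values of l shrink, grow, or flip that offset, which produces the spiral.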

#### III. Enhanced whale optimization algorithm

In the Enhanced Whale Optimization Algorithm, evaluating the fitness of the population identifies the best solution globally. In the modified mutualism phase of each iteration, two random individuals, $P_m$ and $P_n$, are selected, and the new value of $P_i$ is computed from whichever of the two has the minimum fitness. If $P_m$ is the fitter of the two,

$$P^{k+1}_{i} = P^{k}_{i} + rn(0, 1) \times (P_{m} - MV \times BF_{1}) \tag{15}$$

$$P^{k+1}_{n} = P^{k}_{n} + rn(0, 1) \times (P_{m} - MV \times BF_{2}) \tag{16}$$

Otherwise,

$$P^{k+1}_{i} = P^{k}_{i} + rn(0, 1) \times (P_{n} - MV \times BF_{1}) \tag{17}$$

$$P^{k+1}_{m} = P^{k}_{m} + rn(0, 1) \times (P_{n} - MV \times BF_{2}) \tag{18}$$

where MV is Mean($P_i$, $P_n$) in the first case and Mean($P_i$, $P_m$) in the second, and BF (1 and 2) are the benefit factors. The fittest individuals are retained. This algorithm improves convergence by picking the best random individual in the modified mutualism phase and the global best individual in the local search phase of the Whale Optimization Algorithm with Mutualism (WOAmM). It differs from the original WOA in these respects and stops when the termination criterion is met.
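The modified mutualism phase can be sketched as below. This is an illustration under several assumptions that the text does not fix: fitness is minimised, each benefit factor is drawn as 1 or 2, rn(0, 1) is drawn per dimension, and a candidate replaces an individual only if it is fitter. The `sphere` fitness function and all names are hypothetical.

```python
import random

def sphere(x):
    """Hypothetical fitness to minimise (sum of squares)."""
    return sum(v * v for v in x)

def mutualism_step(pop, i, fitness, rng):
    """Modified mutualism for individual i, updating pop in place."""
    others = [j for j in range(len(pop)) if j != i]
    m, n = rng.sample(others, 2)
    if fitness(pop[n]) < fitness(pop[m]):
        m, n = n, m  # relabel so pop[m] is the fitter of the two
    # Mutual vector: mean of individual i and the less fit partner.
    mv = [(a + b) / 2.0 for a, b in zip(pop[i], pop[n])]
    bf1, bf2 = rng.choice((1, 2)), rng.choice((1, 2))
    for idx, bf in ((i, bf1), (n, bf2)):
        cand = [x + rng.random() * (pm - mean * bf)
                for x, pm, mean in zip(pop[idx], pop[m], mv)]
        if fitness(cand) < fitness(pop[idx]):  # keep the fittest
            pop[idx] = cand

rng = random.Random(1)
pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(6)]
before = [sphere(p) for p in pop]
for i in range(len(pop)):
    mutualism_step(pop, i, sphere, rng)
after = [sphere(p) for p in pop]
print(min(before), min(after))
```

Relabelling the pair so that the fitter individual always plays the role of the guide lets one code path cover both cases of the update. The greedy acceptance means no individual's fitness can get worse in this phase.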





#### 4. MODEL ACCURACY ANALYSIS:

The Enhanced Whale Optimization Algorithm (EWOA) generates discrete random value sets, which are used as inputs for the test systems. RTL-level simulation is performed in Xilinx Vivado HLS to obtain the simulated results.

The complete flow chart of power estimation for the multicore system is given in Figure 2.



Figure 2: Flow Chart

| IP block | P<sub>max</sub> (mW) | P<sub>avg</sub> (mW) | P<sub>min</sub> (mW) | E<sub>max</sub> (%) | E<sub>avg</sub> (%) | E<sub>min</sub> (%) |
|----------|------------------|--------|------------------|-------|-------|-------|
| IP-1     | 251              | 161.71 | 73.8             | 14.98 | 7.82  | 0.34  |
| IP-2     | 9.4              | 5.29   | 2                | 18.70 | 6.56  | 0.85  |
| IP-3     | 26.6             | 13.76  | 2                | 6.09  | 3.81  | 0.53  |
| IP-4     | 289.2            | 236.68 | 186.9            | 14.74 | 6.92  | 0.052 |
| IP-5     | 179              | 144.41 | 117.2            | 19.20 | 6.95  | 0.74  |
| IP-6     | 215.20           | 133.48 | 64.8             | 28.13 | 10.18 | 0.73  |

Table 1. Power and error of IPs

The power of each individual IP block is given in Table 1. Twenty-five random sets are used to validate the accuracy of the power macro-model. A low-power multicore system has been developed and tested at a frequency of 100 MHz. Simulations to assess its performance were conducted on a Lenovo laptop equipped with an Intel Core i5 processor clocked at 2.42 GHz, 8 GB of RAM, and a 64-bit Windows 10 operating system. This configuration renders the system suitable for a range of low-power commercial applications.





| System      | P<sub>max</sub> (mW) | P<sub>avg</sub> (mW) | P<sub>min</sub> (mW) | E<sub>max</sub> (%) | E<sub>avg</sub> (%) | E<sub>min</sub> (%) |
|-------------|------------------|--------|------------------|-------|------|------|
| 1-core MIPS | 328              | 279.47 | 225.6            | 15.55 | 6.09 | 0.29 |

Table 2. Power and error of single core MIPS

| System      | P<sub>max</sub> (mW) | P<sub>avg</sub> (mW) | P<sub>min</sub> (mW) | E<sub>max</sub> (%) | E<sub>avg</sub> (%) | E<sub>min</sub> (%) |
|-------------|------------------|--------|------------------|-------|------|------|
| 4-core MIPS | 1514             | 1253.8 | 869              | 16.82 | 7.45 | 1.15 |

Table 3. Power and error of multicore MIPS

The power and error of a single core of the MIPS architecture are given in Table 2. This power is obtained after joining all IPs together. Four homogeneous cores are then combined to obtain the multicore power and error, which are given in Table 3.

#### 5. REGRESSION ANALYSIS OF MIPS:

The regression model finds the relation between the power and the inputs, in the form of the following equation.

$$Power = \beta_0 + \beta_1 \cdot TD + \beta_2 \cdot SP \tag{19}$$

After adding the single IP coefficients, the power equation becomes:

$$Power_{MIPS} = 0.27 + 0.039 \cdot TD - 0.0172 \cdot SP \tag{20}$$

After adding the MIPS coefficients to the power equation, the multicore MIPS equation becomes:

$$Power_{Multicore} = 1.23 + 0.134 \cdot TD - 0.108 \cdot SP \tag{21}$$
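A regression of this form can be fitted by ordinary least squares. The sketch below is only an illustration: it builds synthetic, noise-free samples from the coefficients of equation (20) and checks that the fit recovers them. The sample TD and SP values are randomly generated stand-ins, not the input sets used in this work, and `fit_power_model` is a hypothetical name.

```python
import numpy as np

def fit_power_model(td, sp, power):
    """Least-squares fit of Power = b0 + b1*TD + b2*SP.
    Returns the coefficient array [b0, b1, b2]."""
    X = np.column_stack([np.ones_like(td), td, sp])
    beta, *_ = np.linalg.lstsq(X, power, rcond=None)
    return beta

rng = np.random.default_rng(0)
td = rng.uniform(0.1, 0.9, size=25)  # stand-in transition densities
sp = rng.uniform(0.1, 0.9, size=25)  # stand-in signal probabilities
# Synthetic powers generated from the coefficients of equation (20).
power = 0.27 + 0.039 * td - 0.0172 * sp
beta = fit_power_model(td, sp, power)
print(beta)
```

With real simulation data the fit would not be exact, and the residuals would feed the error analysis described earlier.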

The results of EWOA are compared with the model that uses GA; the proposed work outperforms the previous model, as shown in Table 4.

| Sr. No. | Parameter             | Proposed model using EWOA | Proposed model using GA |
|---------|-----------------------|----------------|----------------|
| 1       | Number of simulations | 250            | 6250           |
| 2       | Simulation time       | 12.2           | 374            |
| 3       | Percentage error (%)  | 7.45           | 11.42          |

Table 4: Comparison of Proposed model with existing model.







Figure 3: Correlation between simulated and estimated powers for MIPS.

The graphs in Figure 3 show the correlation between the simulated and estimated power of the MIPS processor.



Figure 4: Correlation between simulated and estimated powers for Multi-Core MIPS.

The graphs in Figure 4 show the correlation between the simulated and estimated power of the multicore MIPS processor. Twenty-five random sets are taken to validate the correlation and accuracy between the simulated and estimated powers of the processor.

#### 6. Conclusion:

Early power estimation is very important in electronic system design, as it allows designers to adjust their power budgets and improve system reliability, leading to reduced turnaround times.

In this study, a statistical macro-modeling technique for estimating power consumption using the Enhanced Whale Optimization Algorithm (EWOA) is introduced. This methodology involves generating random input patterns that are provided to the digital test system to compute power usage. The average power consumption for the entire test setup is then derived from these computations.

We conducted experimental assessments comparing our statistical power macro-model with a commercial Electronic Design Automation (EDA) power simulator, both running at a 100 MHz frequency. To validate the accuracy of our model, we conducted a statistical error analysis, demonstrating its precision within an acceptable margin of error and its efficient computational time. This modeling approach proves to be suitable for commercial applications involving low-power multicore processors.

#### 7. References:

- [1] Y. Nasser, J. Lorandel, J.-C. Prévotet, and M. Hélard, "RTL to transistor level power modeling and estimation techniques for FPGA and ASIC: A survey," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 40, no. 3, pp. 479-493, 2020.
- [2] G. Huang *et al.*, "Machine learning for electronic design automation: A survey," *ACM Transactions on Design Automation of Electronic Systems (TODAES)*, vol. 26, no. 5, pp. 1-46, 2021.
- [3] A. Alagarsamy, L. Gopalakrishnan, S. Mahilmaran, and S.-B. Ko, "A self-adaptive mapping approach for network on chip with low power consumption," *IEEE Access*, vol. 7, pp. 84066-84081, 2019.
- [4] F. Siddiq and Y. A. Durrani, "Efficient power macromodeling approach for an IP-based SoC system using discrete water cycle algorithm," *Turkish Journal of Electrical Engineering and Computer Sciences*, vol. 29, no. 3, pp. 1368-1382, 2021.
- [5] A. V. Bhaskar and T. Venkatesh, "Performance analysis of network-on-chip in manycore processors," *Journal of Parallel and Distributed Computing*, vol. 147, pp. 196-208, 2021.
- [6] B. Halavar and B. Talawar, "Power and performance analysis of 3D network-on-chip architectures," *Computers & Electrical Engineering*, vol. 83, p. 106592, 2020.
- [7] S. Feng, Y. Yan, H. Li, L. Zhang, and S. Yang, "Thermal management of 3D chip with non-uniform hotspots by integrated gradient distribution annular-cavity micro-pin fins," *Applied Thermal Engineering*, vol. 182, p. 116132, 2021.
- [8] M. Ansari *et al.*, "Power-aware checkpointing for multicore embedded systems," *IEEE Transactions on Parallel and Distributed Systems*, vol. 33, no. 12, pp. 4410-4424, 2022.
- [9] M. Haataja, "Register-transfer level power estimation and reduction methodologies of digital system-on-chip building blocks," M. Haataja, 2016.
- [10] J. J. Justus *et al.*, "Hybridization of Metaheuristics Based Energy Efficient Scheduling Algorithm for Multi-Core Systems," *Computer Systems Science & Engineering*, vol. 44, no. 1, 2023.
- [11] M. H. Qais, H. M. Hasanien, and S. Alghuwainem, "Enhanced whale optimization algorithm for maximum power point tracking of variable-speed wind generators," *Applied Soft Computing*, vol. 86, p. 105937, 2020.
- [12] F. Siddiq and Y. A. Durrani, "Efficient power macromodeling approach for heterogeneously stacked 3d ICs using Bio-geography based optimization," *PLoS ONE*, vol. 17, no. 2, p. e0264181, 2022.
- [13] Y. A. Durrani and T. Riesgo, "Power macromodeling technique and its application to SoC-based design," *International Journal of Numerical Modelling: Electronic Networks, Devices and Fields,* vol. 30, no. 6, p. e2207, 2017.
- [14] S. Chakraborty, A. K. Saha, S. Sharma, S. Mirjalili, and R. Chakraborty, "A novel enhanced whale optimization algorithm for global optimization," *Computers & Industrial Engineering*, vol. 153, p. 107086, 2021.
- [15] J. Zhang, M. Kong, G. Zhang, and Y. Huang, "Weapon-target assignment using a whale optimization algorithm," *International Journal of Computational Intelligence Systems*, vol. 16, no. 1, p. 62, 2023.